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Abstract 

We present nonparametric techniques for constructing and verifying density estimates from high-dimensional 
data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. 
A low-dimensional representation of the data is critical in such situations because of the curse of di- 
mensionality. Our proposed methodology consists of three main parts: (1) data reparameterization via 
dimensionality reduction, wherein the data are mapped into a space where standard techniques can be 
used for density estimation and simulation; (2) inverse mapping, in which simulated points are mapped 
back to the high-dimensional input space; and (3) verification, in which the quality of the estimate is 
assessed by comparing simulated samples with the observed data. These approaches are illustrated via 
an exploration of the spatial variability of tropical cyclones in the North Atlantic; each datum in this 
case is an entire hurricane trajectory. We conclude the paper with a discussion of extending the methods 
to model the relationship between TC variability and climatic variables. 
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1. Introduction 

In the realm of high-dimensional statistics, regression and classification have received much attention, 
while density estimation has lagged behind. Yet, there are compelling scientific questions which can 
only be addressed via density estimation using high-dimensional data and sampling from such estimates. 
Consider the paths of North Atlantic tropical cyclones (TC), some of which are shown in Figure 1. 
Temporarily assuming that tropical cyclones are independent and identically distributed, how would 
one use this data to estimate the probability that the next TC will make landfall at a particular swath 
of coastal North Carolina? Or how can one relate changes in TC paths over time to major climatic 
predictors such as sea surface temperature? If we cast each track as a single high-dimensional data 
point, density estimation allows us to answer such questions via integration or Monte Carlo methods. 

All attempts to perform high-dimensional density estimation (HDDE) will require an element of 
dimensionality reduction to be feasible. Most existing methods, however, suffer from assumptions that 
are not appropriate for the applications presented above. Linear methods, such as Principal Component 
Analysis (PCA)[1], simply project all data points onto a lower-dimensional hyperplane, and are hence 
not able to describe complex, nonlinear variations. More recent work in HDDE has assumed sparsity 
of the input data [2], in the sense that the complex variations in density are a function of only a few 
of the original dimensions used to represent a datum. This is not typical of the data we consider here. 
For example, suppose one assumed that the density of a TC could be described by, say, three points on 
its path. This is conceding the ability to answer the questions described above, as we would have no 
information about the behavior of the track between these three points on the path. 

Thus, there is a need for research on methods for nonparametric, nonlinear HDDE that involves 
dimensionality reduction and yet allows sampling from the original input space. We present an approach 
which utilizes a spectral connectivity analysis (SCA) method [3] called diffusion maps. SCA reparame- 
terizes the data in a way that preserves context-dependent similarity. SCA and other eigenmap methods 
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have been very successful for data parameterization [4, 5, 6], regression [7], and clustering and classifi- 
cation [8, 9, 5, 10]. In this work we extend these successes to a high-dimensional density estimator with 
the potential to address key questions regarding TC behavior, among other applications. 

In what follows, we introduce a new approach to high-dimensional density estimation and sampling. 
The method is illustrated via the analysis of TC data. The basic idea of our approach is to perform 
density estimation in a reduced space using standard methods, to then generate a random sample from the 
lower-dimensional density, and finally to map the sampled data points back to the input space. We also 
describe a statistical method for evaluating the quality of the output of our simulation algorithm. While 
traditional techniques such as Q-Q plots and Kolmogorov-Smirnov tests work in very low-dimensional 
spaces, one needs to confirm that not too much information was lost in the reduction and that the 
original data set and the sample generated by the algorithm can reasonably be thought to come from the 
same distribution. We present a simulation-based approach that is independent of the particular HDDE 
method. 
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Figure 1: Forty randomly selected tracks of the 608 TCs between 1950 and 2006. 



2. Methodology: Motivation and Description 

The low-frequency, high-severity nature of tropical cyclones in the North Atlantic Ocean means that 
important and costly public policy, military, and business decisions are being made on the basis of 
relatively little historical data, and consequently any methodology that can extract more information 
from the data is valuable in advancing scientific, security, and economic interests. Much attention has 
been paid to hypotheses about the effect of various climatic predictors on TC occurrence frequency, 
TC landfall frequency, and TC intensity. However, few people have exploited the relationship between 
climatic predictors and the spatial variation of TCs, i.e. the TC tracks. This is largely due to the 
challenging nature of characterizing these relationships, and not due to a lack of importance. As Xie 
et al. [11] state, in addition to the focus on yearly counts and intensity, "it would be of great benefit to 
society if the preferred paths of hurricanes could also be predicted in advance of the onset of hurricane 
season." Figure 1 illustrates the paths of 40 randomly selected tracks out of the 608 TCs that occurred 
between 1950 and 2006 [12]. 

To the extent that the spatial distribution of TC tracks has been investigated, researchers have 
primarily focused on variability in landfall location. For example, Hall and Jewson [13] address the 
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question of the effect of SST on landfall rates over fairly large regions of coastline; they use a rough- 
grained conditioning scheme which buckets years into "hot years" and "cold years". Xie et al. [11] extend 
beyond landfall considerations and use empirical orthogonal functions to correlate climatic predictors 
with a "hurricane track density function" (HTDF). However, HTDF is somewhat of a misnomer, as the 
object they construct is not a density over tracks but a density over the ocean: the magnitude of the 
HTDF at x corresponds to x's proximity to observed hurricane tracks. The HTDF is really a reduction 
of the density; if one has a density over all tracks, one can construct the probability of a TC passing by a 
particular point by integrating the probability over all tracks which pass by that point, or via simulation. 

The majority of work in track density estimation has adopted a similar approach, working in two 
spatial dimensions: first estimate a genesis (origination) density over the region of interest (e.g. the North 
Atlantic); then estimate a series of Markovian densities of track propagation, usually corresponding to 
6-hour steps in which the distribution of the next location is a function of only the previous location; 
finally couple this with a lysis (death) component so that the simulated hurricane eventually stops 
[14, 15, 16, 17]. For example, Vickery et al. [17] uses the following model for changes in translation speed 
c and direction 9 of a TC from time i to i + 1: 

Alnc = a\ + <22?/> + a3 A + 04 lnc^ + a^Oi + e 
A6 = bi + b 2 ip + b 3 X + biCi + 65^ + 6 6 0»-i + $ 

where ip and A are latitude and longitude and e and 5 are error terms. In addition, to model spatial 
variability the parameters ai , 0,2 , ■ ■ ■ , a.5, b\ , i>2> • • • , be vary over each box in a 5° x 5° grid over the Atlantic 
Ocean. Clearly, a primary drawback to this approach is the proliferation of parameters to estimate and 
models to validate. 

To ground the explanation of the general methodology laid out in this paper, we will present its 
application to the task of estimating the density of the 608 TC tracks between 1950 and 2006, forty of 
which are shown in Figure 1. 



2.1. Dimensionality Reduction 

The first step in our algorithm is to perform dimensionality reduction and reparameterize the data 
using a SCA technique called diffusion maps [4, 5]. Diffusion maps rely on a metric that quantifies the 
"connectivity" of a data set and introduces a new coordinate system based on this metric. 



Diffusion Maps 

The construction begins by creating a weighted graph G — (ft, W) where the nodes in the graph are the 
n observed data points in R d , e.g. the trajectories of TCs discretized to d spatial locations. The weight 
given to the edge connecting two data points x e ft and y e £1 is w(x,y) = cxp(— A 2 (x, y)/e), where 
A(a;, y) is an application-specific locally relevant distance measure and e controls the neighborhood size. 
We construct a Markov random walk on the graph where the probability of stepping directly from x to 
y is p\{x,y) — w{x, y)/ ^ z w{x, z). This probability will be small unless the two points (such as two 
TCs) are similar to each other. We then iterate the walk for t steps, and consider pt(x, •), the conditional 
distribution after t steps having started at x. One natural way to think of two points as similar is if their 
i-step conditional distributions are close; formally, we define the "diffusion distance at scale t" as: 

Dfr,y) = ^ t{x > Z \-/ z f y > z))2 . (i) 

The stationary distribution ^o(') of the random walk penalizes discrepancies on domains of low density. 

One can show there is a mapping — the diffusion map — which reduces the dimension while still ap- 
proximating diffusion distance. Each high-dimensional data point x can be mapped to reduced dimension 
m via the following transformation: 

* t :z^[A^i(aO,A 2 V> 2 (z),...,A^VUz)] (2) 

where \j and ipj represent the eigenvalues and right eigenvectors of P — {p(x, y)} x , y , the row-normalized 
similarity matrix. One can show that 

n— 1 m 

D\{x,y) = ]T xf(^(x) Uv)f « E *T(M*) Uv)) 2 - Mv)\\ 2 - (3) 
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(a) A subset of the observed tracks. 



(b) The observed tracks in diffusion space. 




(c) A density over diffusion space 



° ctt cP^p ° o oo O 




(d) A random sample from the density. 




(e) A subset of the sample mapped back into 
track space. 



Figure 2: An overview of the dimensionality-reduction approach to TC track simulation, (a) shows 40 randomly 
selected tracks out of a total of 608 TCs observed between 1950 and 2006. (b) shows all 608 tracks mapped to diffusion 
space for m = 2 and e = 430, with each point corresponding to a particular track in (a). (Although the analysis in this 
paper works with m = 3, we use the two-dimensional map here to be able to visualize the whole process.) An estimated 
density for the diffusion space data of (b) is visualized in (c), and a 608-element sample from that density is shown in (d). 
Each point in the sample can be interpreted as being associated with a new, as-yet- unobserved track. The sample is finally 
mapped back into track space; 40 randomly selected TCs of the sample are shown in (e). 



In other words, the Euclidean distance between two nodes in the reduced space approximates their 
diffusion distance, which in turn reflects the connectivity of the data as defined by a i-step random walk. 
Richards et al. [7] demonstrate that the diffusion coordinates can be used to meaningfully model physical 
information in a regression setting. 

Here we focus on pure spatial similarity: Do the TCs follow similar paths? We begin by regularizing 
each track, transforming it from its original representation with a varying number of points, which 
represent the position of the TC in six-hour increments, to a new representation with thirteen regularly- 
spaced points. We define the distance measure A(x,y) to be the sum of the Euclidean distance between 
the 13 corresponding points on each track pair (see Figure 2.1) and we use the dimension m = 3, the 
selection of which is described at the end of Section 2.3. We stress that A(-, •) need not be Euclidean 
distance; we can use an application-specific distance measure. One could, for example, imagine a different 
project in which two TCs are considered to be similar not only if their tracks are spatially similar but 
also if their intensity histories are similar. 
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Figure 3: Two regularized tracks are shown above: the original 6-hour segments are marked by circles, and the regularized 
segments are marked by X . The application-specific distance measure is the sum of the distances between 13 corresponding 
pairs of points, shown as dashed lines. 

The issue of how to select the parameters 8 = (e, t) in SCA is usually an open problem. Below we 
present an approach that is closely connected to the so-called Nystrom extension of the map. 

Nystrom Extension and Parameter Selection 

As the diffusion map is defined only on the points in the graph, we need a technique to project new points 
into the map. In other words, we wish to extend ipi, the right eigenvectors of the transition matrix, to a 
new value y. We know that for Vxi,X2 £ ^, 

^2 p(xi,x 2 )^j(x 2 ) = \jipj(xi). (4) 
Hence, we can approximate an extension to y by replacing x\ with y: 

y Piv^^ji^) = 4>j{y)- (5) 

3 x 2 en 
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This leads to the extended diffusion map 
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We also need to extend the transition matrix to have a y-row, but this can be done exactly: simply apply 
the distance function to y and all members of Q, and then normalize the row. 

With the extension, we can now perform cross-validation of a collection of parameters, 0: 

1. Hold out observation Xi from the test set. 

2. Compute *(_,) ; e ) the diffusion map computed with all observations but x^. 

3. Use the Nystrom extension to find *(_j) ; e(a;i), the projection of Xi into diffusion space. 

4. Find Xi = &7_iyQ(^(—i)-e{%i)), the pre-image of the projection of Xi. 

5. Calculate A(xi,Xi), the distance between the true track and its approximation. 

6. Repeat for all observations, returning ^\ A(xi, £i) as the error for 6. 

7. Repeat for all candidate values of O. 

8. Return as the model parameters the value of whose error is smallest. 

Given the Nystrom extension, all the steps are straight-forward except for step 4. How do we find 
pre-images of points in diffusion space? As this is an important step later in the algorithm as well, let 
us consider it more generally: let £ be the point in diffusion space we want to find a pre-image for. As 
emphasized in Arias et al. [18], we search for the point in the original space whose extension into diffusion 
space comes closest to ( in Euclidean distance. In other words, we want to select 



The implementation of the pre-image search will be context-specific; we defer the discussion of the pre- 
image search for the TC density until Section 2.2. Figure 2(b) shows a two-dimensional diffusion map 
of the TC data. 

Density Estimation and Random Sampling 

Once the data are mapped into the low-dimensional space, a range of nonparametric density estimators 
could be employed. In our initial work, we use k-nearest-neighbor kernel density estimation (rather than 
density estimation with a fixed kernel bandwidth), because of the tendency for data points to cluster near 
an apparent "boundary" in diffusion space. Also, using this estimator, simulation is trivial [19]. Figures 
2(c) and 2(d) show an example of density estimation and random sampling for TC data. Future work 
will focus on fitting models which allow for natural incorporation of covariates and time dependence; see 
Equation 11 in Section 3. 

2.2. Inverse Mapping 

In order to verify the validity of our density estimate, and also to be able to simulate new sample tracks, 
we need a method for finding the pre-image of an arbitrary point in diffusion space. As mentioned, 
using the Nystrom extension and pre-image objective of Equation 7 there is a natural approach: the 
pre-image can be approximated as the track whose projection into diffusion space comes closest to the 
point that we wish to invert. In practice, however, designing a search mechanism that is both sufficiently 
exhaustive and computationally feasible is difficult. One solution is to restrict the pre-image to be 
a convex combination of observed data objects, assuming they are of the same dimension (or can be 
approximated as such in a meaningful way) [20, 21]. This is the approach we describe here. 

Let C be the point in diffusion space for which we seek the pre-image. The Euclidean distance from 
C to *$/(x) for each observed track x E ft is a natural measure of the similarity between \I> _1 (C) and x. 
Thus, we can construct weights as 
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and then use these weights in constructing the convex combination: 

The problem has thus been reduced to searching over a single parameter which controls the spread 
of the normal kernel used to determine the weights. This approach does require that each track be 
condensed to an equal number of points along its path, but that number need not be small in order for 
this to be feasible. Furthermore, note that in practice it will typically only be necessary to interpolate 
between tracks which are similar, i.e. if w((, x) and w{Q, y) are both large, then x and y will usually be 
alike. 

However, constructing inverses as convex combinations of observed tracks produces simulated tracks 
that are never more extreme than the most extreme observed track for whichever spatial measurement one 
might want to consider. For example, using this approach, no pre-image could be longer than the longest 
of the observed tracks or shorter than the shortest of observed tracks. To overcome this shortcoming, in 
addition to searching over a^, we also allow the average to stretch up to 150% and shrink down to 75%. 
In other words, we do not search for merely the convex combination whose projection comes closest 
to r], but we treat the convex combination as a form which can be stretched (in both directions) by 
the optimal factor in [.75,1.5]. Furthermore, we consider separately a shrink/stretch anchored at the 
origination point and the lysis point of the convex combination. Despite these extensions, one remaining 
shortcoming is that it is not possible to sample a track with more loops than any of the observed tracks, 
although it is possible for such a TC to occur. A set of 608 tracks simulated using our algorithm is shown 
in Figure 2(e). Other variations on the pre-image construction can and will be explored. 

More importantly, we are developing ways of testing the validity of the entire analysis pipeline, as 
summarized in Procedure 1. Our approach to assessment relies upon comparing the observed tracks to 
simulated tracks, which is described next. 



Procedure 1 High-dimensional density estimation 

Input: ft, a high-dimensional data set of dimension d and size n; 

A : R d x R d — > M, a locally-relevant distance measure. 
Output: f2, a sample from the estimated density of 0. 
1: Dimensionality reduction: 

2: Construct an m-dimensional diffusion map for f2, A: 
3: Select model parameters e and t via cross-validation. 

4: Perform density estimation in diffusion space to form ju. 
5: Generate a size-n random sample from fl. 
6: Inverse mapping: 

7: Find il = <I'~ 1 (<I>), the pre-image of <J> in the d-dimensional input space. 
8: Validate results via repeated simulation from % 



2.3. Validation 

Although best efforts at verification were made in each stage of the above algorithm — cross-validating 
the diffusion map parameters, selecting reasonable bandwidths in the density estimation — we would 
like to have a procedure for making a comprehensive evaluation of the resulting estimate. Our approach 
will be based on a test of the hypothesis that two high-dimensional samples — the observed data and 
simulated data — come from the same underlying distribution. 

We are pursuing results which will prove consistency of the density estimator, but this will be chal- 
lenging, in part because it will require a new analysis every time a part of the algorithm is modified (for 
example, if one wanted to move from kernel density estimation to locally linear density estimation) . Here 
we produce a nonparametric high-dimensional verification technique that treats the particulars of the 
methodology as a black box. We assess whether a new sample can reasonably be said to come from the 
same distribution as the observed data, regardless of how the former was generated. This is analogous 
to existing tools for one-dimensional analysis (Q-Q plots, the Wilcoxon rank-sum test, the two-sample 
Kolmogorov-Smirnov test). While there are multivariate extensions to some of these classic tests [22], 
these methods often struggle with extensions beyond two dimensions. We utilize a simple test statistic 
similar to that given in Hall and Tajvidi [23], which allows for genuine high-dimensional comparisons, 
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and also yields a visual assessment tool for helping to identify, and therefore possibly correct, the ways 
in which the simulation fails. 

We make a connection between the choice of the local distance metric A and the validation of the 
density estimate; in practice, this connection can be used in motivating the choice of A. Formally, let 
fix and fi2 be two distributions over the input space and let X\, X2, ■ ■ ■ , X n be i.i.d., distributed as 
and let Y\, Y2, . . . , Y n be i.i.d., distributed as \i2- Define the quantity £a(mi, M2) to be the proportion of 
the values 

(X 1 ,X 2 ,...,X n ,Y 1 ,Y 2 ,...,Y n ) (9) 

whose nearest neighbor (as measured by A) is from the same sample. Let 1Z a{hi, M2) = E (£a(mij ^2))- 
We define a density estimator to be consistent with respect to local distance metric A if 

lim U&(ji n ,nx) = 0.5, (10) 

n^oc 

where /2 n is the estimated distribution, and nx is the true distribution. Heuristically, if the two distri- 
butions pi n and fix are the same, then the nearest neighbor of any sample value is equally likely to be 
from either of the two samples. 

A simulated test 

In practice, how can one use the motivation behind the more formal notion of consistency with respect to 
the local distance metric to produce a test of our sampling mechanism? Noting that all samples generated 
by the algorithm will be from the same distribution, we can used a simulation-based approach: 

1. For some large number k, generate k pairs of samples of size n using the algorithm. 

2. For the i th pair, calculate and record Ca- 

3. Generate one last sample of size n using the algorithm and pair it with the observed tracks; calculate 
I* = Ca for these values. 

4. Evaluate where I* falls in the distribution of the k proportions; reject the hypothesis that the 
observed tracks come from the estimated density if it is too far in the tails. 

This test can be adapted to any sampling mechanism, not just the one presented in this paper. Note that 
rejecting the null hypothesis is not always indicative of a problem. In addition to verifying a sampling 
method, this technique can be used to find and hone in on regions of dissimilarity among samples that 
one expects to differ - Section 3 provides an example. 

When this test was applied to the observed tracks and the sample shown in Figure 2(e), there were 
689 within-sample nearest neighbors, for a proportion of ^||g = 0.56. For k = 1000, there were 31 
pairs whose within-sample NN proportion was higher, despite being from the same distribution. This 
indicates that there is room for improvement in the steps of our algorithm, but the simulated data are 
fairly similar to the observed data. Given that we used only m — 3, this is quite encouraging. 

Visual assessment 

In a Q-Q plot one can often immediately diagnose the nature of dissimilar samples (for example, one 
sample having heavier tails than another). However, as emphasized in Hall and Heckman [24], it is much 
harder to craft visualization methods for high-dimensional data, as they tend to be co-dimensional with 
the data. But if one of the samples is created via a method that involves dimensionality reduction, this 
can be used in conjunction with the local distance metric to provide a quick visual gauge of the region 
of dissimilarity. 

We plot both the original data and the sampled data in diffusion space, distinguishing the points not 
based on which sample they came from but by whether their nearest neighbor is of the same or different 
sample. One might be able to visually identify a region in the diffusion map which is too saturated with 
data points who have within-sample nearest neighbors. Of course, this will not provide as easy an answer 
as "different thickness of tails" or other causes of one-dimensional dissimilarity, but by inspecting the 
data points in the saturated region in their higher-dimensional representation, it can provide one with 
tools for generating hypotheses on why the simulation is insufficient which a single test statistic would 
not be able to do. 
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Choice of dimension 

Note that there is a tradeoff between the dimensionality reduction - in which a larger m improves the 
results by retaining more information - and the density estimation - in which a larger m makes density 
estimation more difficult. In selecting m = 3 we used a criterion that encompasses all components of 
the method (as opposed to selecting m at the same time that e and t are estimated). In particular, to 
evaluate a fixed choice of m, proceed with Procedure 1; after performing the density estimation, generate 
100 sets of simulated tracks of size n = 608. For O, the set of observed tracks, and Si, the i th set of 
simulated tracks, find C^{0,S{). Average these 100 ratios. Select the dimension whose average ratio 
is closest to 0.5 as the optimal dimension. For this application we considered m £ {2,3,4}, this kernel 
density estimation in greater than four dimensions is not reliable with only 608 observations; in Section 
3.1 we discuss a form of orthogonal series density estimation that we expect will perform well in higher 
dimensions. 

3. Conditional Density Estimation 

Some of the most important questions regarding TCs could be addressed, at least partially, through a 
better understanding of the relationship between TC occurrence and other measurable characteristics of 
the climate system. Such relationships could be utilized in, for instance, creating and verifying complex 
simulation models, predicting future trends in TC activity, and understanding human influence on the 
climate system. Specifically, an area of great concern is the effect that rising sea surface temperatures 
(SST) might have on the frequency and/or intensity of TCs. The simulation method introduced in this 
paper can be applied to the question of how changes in sea surface temperature might affect the landfall 
distribution of TCs. 

First consider a set-up similar to Hall and Jewson [13]: they focused on the 19 hottest and 19 coldest 
years from 1950 to 2005, where a year's "temperature" was defined as the July- August-September SST 
averaged over a region of the Atlantic. After dividing the North American continental coastline into six 
major segments — the U.S. Northeast, the U.S. mid-Atlantic, Florida, the U.S. Gulf, the Mexican Gulf, 
and the Yucatan peninsula — they performed hypothesis tests on the difference in yearly landfall rates 
between the hot and cold years. In all regions but the U.S. Northeast, the landfall rate was higher in 
hot years, with the difference in the Yucatan being found as statistically significant. 

Their approach requires that one assume a particular theory about the relationship between SST and 
landfall rates. It also requires that the coastline be divided into somewhat arbitrary, large segments. It 
would be preferable to have the densities inform us as to the regions which are experiencing differences 
among the hot and cold years. For example, consider Figure 4, which shows the density estimate found 
when our methods arc applied separately to the tracks from cold years (Figure 4(a)) and the tracks from 
hot years (Figure 4(b)). If we focus on the regions in which the hot density is much higher than the cold 
density — specifically, the range Dl £ [2.4, 3], D2 e [—.9, —.4], found using the technique of Section 2.3 
— we can map the tracks that fall into that region, as shown in Figures 4(c) and 4(d). We see that most 
of these tracks are southern U.S. and Central American landfalling tracks, which comports with Hall 
and Jewson [13]. Density estimation allows us not only to test hypotheses about the effect of climatic 
predictors on TCs, but also provides a way of generating insight into the nature of these relationships. 

The above analysis illustrates how the methods described in this paper can be used to investigate 
questions about TC tracks, but it reduces SST to a binary variable: "hot year" or "cold year." These 
results treat a 56-year stretch of extreme climate events as samples from two distributions. For those 
with long-term interests, such as insurers or bodies that establish coastal building codes, this may be 
sufficient, as they are interested in the distribution of extreme events over the next 50 years, but many 
important questions concern short-term temporal and spatial variation in TC distribution. Thus, our 
primary interest in the study of TC tracks is not in estimating a single high-dimensional density, but 
instead to quantify the changes in the track-space density over time, and to model how these changes 
relate to other climate variables. 

In fact, SST data is available as a high-resolution time series, as a function of ocean position. We 
have begun to exploit these data when fitting models; the potential of such a model is illustrated in 
Figure 5. In this example, a three-dimensional diffusion map is created using 1000 TC tracks since 1900; 
this map is shown in the upper-left panel of the figure. Consider a point in three-dimensional diffusion 
space, zo = (0.39, 0.086, —0.0098). The upper-right panel of Figure 5 shows all of the tracks which are 
within a small diffusion distance (i.e. small Euclidean distance in diffusion space) of this point; these are 
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a cluster of storms which remain far from the Atlantic coast. The dashed line in the lower-left panel 
shows the change in the density of tracks near Zq over all of the years, smoothed over time. 

This panel also depicts the sea surface temperature at latitude/longitude (30W,15N) 2 , chosen because 
it is close to the genesis point of the storms shown in the upper- right panel of Figure 5. The two vertical 
lines correspond to important time points in the improvement of storm observations: first in 1945 when 
plane-based observations began, and second in 1966 when satellite-based tracking began [26]. It is evident 
that once the improved data from satellites became available, there is a close correspondence between 
SST and storm occurrence. We plan to exploit these, and more sophisticated, temporal and spatial 
relationships. In particular, our formal models for conditional density estimation (CDE) will incorporate 
SST (and similar variables) by transforming the available SST data from time series defined at each 
position on the ocean into time series defined at each position in diffusion space. There is a natural way 
of doing this, as each point in diffusion space corresponds to a track on the ocean: simply average SST 
over that track for each time point. The result will be a quantity which is localized both in space and 
time, and ready for direct comparison with the estimated TC density. 



3.1. Orthogonal Series Density Estimation 

The majority of work on methods for CDE has focused on the Nadaraya- Watson conditional kernel 
smoother first proposed by Rosenblatt [27], in which the conditional density is estimated as the ratio of 
the kernel density estimates for the joint density of the response and the predictors and the marginal 
density of the predictors [28, 29, 30, 31, 32]; however, that form of CDE is not as amenable to a high- 
dimensional response as orthogonal series conditional density estimation in which the basis is adapted 
to the data. 



2 Prom the Smith- Reynolds Extended Reconstructed Sea Surface Temperature Data Set [25]. 
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Figure 5: Upper left: diffusion map created using 
1000 Atlantic storm tracks, where color indicates 
the year of the storm. Upper right: tracks of storms 
which are close to (0.39, 0.086, -0.0098) in diffu- 
sion space. Lower left: plot of estimated density 
(rescaled) at point (0.39, 0.086, -0.0098) in diffu- 
sion space (the dashed line), compared with SST 
at (30W,15N) (the solid line). The two vertical 
lines indicate times at which the observations of 
storm tracks improved: in 1945 when plane-based 
tracking began, and 1966 when satellite-based ob- 
servation began. 



Orthogonal series density estimation is motivated by the decomposition of a density as fz(z) — 
YHtLa6ifi{ z ) f° r an orthonormal series {<pi : i € Z +0 }; the density is then estimated as fz{z) — 
Yli=o^iVi{ z )- Typically, noting that E{ipi{z)) = J (^2^L 9jipj(z)j ipi(z)dz — 9i, one estimates the 

Fourier coefficients as 9i = ~ X}j=i Vi( z j)- The density estimation problem then is reduced to one of 
choosing the k that achieves the best bias- variance tradeoff. 

In the conditional case, the series that we want to estimate is 

oo oo 

f{y\x) = y £^2,0 i ^ij(x,y). (11) 

i=0 j=0 

While this Fourier coefficient estimation scheme described above is trivially extended to the multi- 
variate case, it cannot be neatly ported to the conditional case as the data points (xi,yi) are not samples 
from the conditional density fyix but from the joint density fy,x- But Efromovich [33] illustrates how 
the conditional case can be cast as an expectation with respect to the joint density: 



fY\x(y\x)<Pi,j(x } y)dxdy = J J ^jr^^ <PiJ ( x > v) dxd V = E ( ^'fxfx^ ) ' ^ 



Thus the conditional Fourier coefficients can be estimated as 9i ., = - J~)2^ v ~ 3 ^ x ' v ' > 
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This is relatively straightforward, yet we have not yet addressed the high dimension of the response 
y. The dimension reduction in this approach comes from using the orthonormal basis provided by the 
diffusion map, as it is adapted to the intrinsic geometry of the data. Specifically, if we think of ipij(x, y) 
as a tensor-product basis bifurcated by the predictor and response - ipij(x,y) — <pi(x)Xj(y) - then the 
eigenfunctions estimated by the eigenvectors of the transition matrix P make a natural candidate for the 
response component of <p, i.e. X(y) = ip(y). 

In Richards et al. [7] , this idea was successfully applied to a high-dimensional linear regression model 
using diffusion maps, in which the estimated eigenfunctions were used in the orthogonal scries estimate of 
the regression function r(x). (This worked well, because the ipfe) are sample approximations to smooth 
basis functions supported on the data (see [3]), and the relationship between the response and covariates 
(after rcparamctcrization via the diffusion map) was sufficiently smooth. A more sophisticated basis may 
be required in this case due to the complex spatial variation in TCs; we are therefore also considering 
multi-scale bases based on a variation of the treelet expansion described in Lee et al. [34].) Girolami 
[35] employed a similar method for unconditional orthogonal series density estimation from kernel PCA; 
their basis is derived from the eigendecomposition of the Gram matrix. 

The methods outlined in this paper provide a promising framework to address important scientific 
questions regarding the behavior of tropical cyclones. Using an approach to SCA, dimensionality reduc- 
tion can be achieved without significant loss of important scientific information. The low-dimensional 
space yields a natural parameterization of the data, useful for constructing nonparametric density esti- 
mates and relating temporal and spatial variability in TCs to variations in other climate variables. In 
addition to the parameterization, the eigenvectors can be used as a basis for orthogonal series density 
estimation adapted to a high-dimensional setting. 
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